Abstract


This manuscript presents the results from a study to investigate the technical characteristics of two versions of a number 
line assessment (NLA 0-20 and NLA 0-100). The sample consisted of 60 kindergarten and 46 first grade students. Both 
number line versions had sufficient alternate form and test-retest reliability. The NLA 0-20 had low and the NLA 0-100 
had low to moderate correlations with math achievement. Results indicated that the NLA 0-100 explained a small, but 
unique portion of the variance in first grade mathematics performance when controlling for performance on the Assessing 
Student Proficiency in Early Number Sense (ASPENS), a set of early numeracy screening measures. We discuss study results
related to the utility of adding number line assessment tasks to mathematics screening batteries and propose additional
areas of research.
Over the past two decades, there has been significant inter-
est in improving the performance of our nation’s children
in the area of mathematics (National Mathematics Advisory
Panel, 2008; National Research Council, 2001). This interest is
driven, in part, by low levels of mathematics performance.
Recent National Assessment of Educational Progress
(NAEP) data
indicate overall low levels of performance at fourth grade 
with only 40% of students being classified as at or above 
proficiency. Findings are of greater concern for students 
from minority populations, low socioeconomic back-
grounds, English learners, and students with disabilities
with a range of 14% to 26% being classified as at or above 
proficiency. NAEP longitudinal trends (2015) also indicate 
that performance levels have remained relatively stagnant 
since 2007 after a prolonged period of sustained improve-
ment. To address long-standing concerns with mathematics
achievement, a focus on early intervention seems war- 
ranted given findings from multiple longitudinal studies 
(Duncan et al., 2007; Morgan, Farkas, & Wu, 2009; 
Morgan, Hillemeier, Farkas, & Maczuga, 2014). Investi- 
gations of mathematics development have demonstrated 
strong relationships between early mathematics risk sta- 
tus at kindergarten and fifth grade (Morgan et al., 2009) 
and documented that students who exited kindergarten 
at-risk in mathematics were 17 times more likely to have
continuous and persistent difficulties in mathematics
through late elementary and middle school (Morgan et al., 
2014). Thus, in the absence of targeted efforts, it is likely 
that students with early difficulties in mathematics will 
continue to struggle as they encounter more advanced 
mathematics (Jordan, Glutting, & Ramineni, 2010). 
Despite recognition of the importance of mathematical 
knowledge and its acquisition as a fundamental goal of 
schooling, systematic efforts to increase mathematics 
achievement are limited. One proposed framework for 
increasing student achievement across academic areas is the 
use of Multitier Systems of Support (MTSS) or Response to 
Intervention (RTI) service delivery models with a focus on 
the prevention of academic difficulties (Lembke, McMaster, 
& Stecker, 2010). However, a recent analysis found that 
while over 70% of schools reported the use of MTSS or RTI 
frameworks to support the acquisition of early literacy skills 
only 35% reported similar support in mathematics (Balu 
et al., 2015). Results in this vein are not surprising given the 
persistent lack of time spent on mathematics (La Paro et al., 
2009), the systematic focus on reading instruction as the
primary charge in the early elementary grades (Clarke,
Doabler, & Nelson, 2014), and the complexity of translating
research into practice for complex RTI procedures in math-
ematics such as data-based individualization to intensify
instruction (Schumacher, Edmonds, & Arden, 2017).

Efforts to develop parallel systems in mathematics are 
dependent upon advances in key components of RTI sys- 
tems (Gersten et al., 2009) and while system-wide supports 
targeting mathematics achievement in school are not as 
commonplace as in reading, within the last decade,
researchers have made significant advances in two key 
areas: screening for risk status and the delivery of evidence- 
based interventions (Fuchs, Fuchs, & Compton, 2012). 
There have been several intervention programs in mathematics
developed and validated for use in the early elementary 
grades (e.g., Clarke et al., 2014, 2016; Dyson, Jordan, & 
Glutting, 2013; Fuchs et al., 2005; Sood & Jitendra, 2013) 
that, as called for by experts, target the development of 
number sense and whole number understanding (Author 
et al., 2009; Frye et al., 2013). In parallel, researchers have 
also focused on building corresponding screening systems 
to identify students in need of additional support (see Fuchs 
et al., 2007, and Gersten et al., 2012, for comprehensive 
reviews). The purpose of this study was to investigate 
potential new approaches to screening for risk status in 
early mathematics. 

Typically, screening measures in early mathematics are 
developed with curriculum-based measurement (CBM) 
design parameters (Deno, 1985), which emphasize that measures
should not only demonstrate strong psychometric properties,
including the capacity to model student growth, but also be
simple, efficient, and easily understood. Measures are
designed as general outcome measures with a focus on 
assessing a student’s overall understanding of mathematics. 
Measures commonly used in early mathematics screening 
batteries can be considered to fall into one of two camps. 
The first set assesses readiness skills such as engaging in 
rote counting or identifying numerals. The second set is 
more focused on assessing key mathematics concepts 
(Clarke, Gersten, Dimino, & Rolfhus, 2011). For example, 
measures focused on student understanding of magnitude 
(i.e., comparing two numerals and identifying the lesser or 
greater magnitude) and the ability to engage in strategic 
counting (i.e., identifying the missing numeral from a 
sequence of numerals) have consistently shown promise for 
identifying students at-risk in mathematics (Author et al., 
2012; Foegen, Jiban, & Deno, 2007; Fuchs et al., 2007). 

The relative success of magnitude comparison and stra- 
tegic counting as screeners may be due to their ability to tap 
into the development of a mental number line (Berch, 
2005). Number line development enables children to engage 
in a range of mathematics tasks as they link their informal 
understanding of number and beginning number sense to
the formal system of numbers taught in schools (Gersten &
Chard, 1999). For example, students may use a mental 
number line when applying counting strategies to solve 
addition and subtraction problems. The use of a mental 
number line is also critical to tasks related to understanding 
numerical magnitudes which encapsulates a student’s abil- 
ity to “comprehend, estimate, and compare the sizes of 
numbers” (Fazio, Bailey, Thompson, & Siegler, 2014). 
Researchers have theorized that the mental number line 
operates as the primary central conceptual structure (Case 
et al., 1996) by which students organize and integrate new
information into their understanding of number systems
(Siegler, Thompson, & Schneider, 2011), making number
line performance a potentially valuable mechanism by which
to identify at-risk status. 

Several researchers (e.g., Booth & Siegler, 2006; Geary, 
Hoard, Nugent, & Byrd-Craven, 2008; Laski & Siegler, 
2007) have attempted to assess the development of a mental 
number line and its role in understanding magnitude through 
a number line estimation task. This task requires students to 
map a given numerical value (typically an Arabic numeral, 
but also nonsymbolic magnitude representations) onto a 
horizontal number line. Traditionally, the number line is 
labeled with two defined end points, and the student is 
asked to indicate where a value lies between these points 
(e.g., placing 5 on a number line with endpoints of 0 and 
20). Estimation error is calculated by comparing the indi- 
vidual’s placement of the numerical value to the actual loca- 
tion of the target value on the number line. 

More accurate placement of numbers on the number line 
estimation task is related to higher math achievement, 
greater proficiency in solving arithmetic problems, and bet- 
ter performance on magnitude comparison tasks (Booth & 
Siegler, 2006; Fazio et al., 2014; Laski & Siegler, 2007). In 
addition, typically achieving students perform better on the 
number line estimation task compared to students with 
mathematics learning disabilities (Geary et al., 2008). As a 
student progresses in their schooling, they continue to refer- 
ence a mental number line as they move onto more chal- 
lenging tasks such as comparing the magnitude of fractions 
(Siegler et al., 2011). Jordan and colleagues (2013) found 
that even when accounting for variables such as fact fluency 
or working memory, third grade number line task perfor- 
mance was the stronger predictor of fourth grade mathemat- 
ics achievement including fraction understanding. Such 
results support the theory that number line tasks tap into a 
broader understanding of number and that number line tasks 
can be used to detect difficulties as students transition from 
working with the whole number system to the rational num- 
ber system (Siegler et al., 2011). The relation between stu- 
dent performance on number line estimation tasks and 
mathematics achievement points to its potential utility as a 
screening tool for students with mathematics difficulties. 
By assessing students’ mental number line in a more direct 
manner that differs from current mathematics screening
practice, we may increase the probability of accurately
detecting early mathematics difficulties. 


Purpose and Research Questions 


Our orientation to this study was based on the field’s current 
practice of using CBM to efficiently screen for mathematics 
risk status; thus, we chose to approach the work by investi-
gating whether number line estimation measures added
value to the screening process when added to a set of com- 
monly used early numeracy CBM screeners. To date, no 
studies have looked at the inclusion of a number line esti- 
mation task within a standard CBM early numeracy screen- 
ing battery. Due to increased interest in investigating 
constructs associated with magnitude, new formats to 
investigate those constructs, and interest in improving early 
numeracy screening systems (Author et al., 2011b), this 
study specifically focused on answering two questions 
related to the potential use of two iPad number line assess- 
ments (NLAs), one spanning 0-20 (NLA 0-20) and one 
spanning 0-100 (NLA 0-100): 


1. What are the psychometric properties, including
alternate form and test-retest reliability and concur-
rent and predictive validity by kindergarten and first
grade, of two iPad-administered number line assess-
ments (NLA 0-20 and NLA 0-100)?

2. To what extent does performance on a number line
assessment (NLA 0-20 and NLA 0-100) add incre-
mental validity to a battery of early numeracy cur-
riculum-based measures (ASPENS) in predicting
student mathematics achievement (EasyCBM)?


Method 


Participants 


The study was conducted in a mid-sized public school dis- 
trict in the Pacific Northwest during a 5-week, district- 
sponsored summer school program. The district enrolls 
1,817 K-3 students. Of those, 77% are economically disad- 
vantaged, 18% have identified disabilities, and 10% are 
English learners. Within-year mobility is 14.4%. The majority
of students are White (67%), with 21% Hispanic/Latino,
7% multiracial, 2% Black/African American, 2% Asian, 
1% American Indian/Alaska Native, and less than 1% 
Native Hawaiian/Pacific Islander. The program operated 2 
hr per day, 4 days per week and provided free breakfast and 
lunch to participants. It served exiting kindergarten and 
first grade students determined by the district to be at-risk 
in reading. At-risk status was determined for kindergarten 
students if they fell below the 30th percentile on an 
EasyCBM Word Reading Fluency measure and for first 
grade students if they fell below the 20th percentile on an
EasyCBM Passage Reading Fluency measure. Reading
instruction was the major content focus; however, the dis-
trict also provided math instruction to all students for 30
min per day through the use of an individualized
computer-delivered program, NumberShire, focused on review and
practice with whole number concepts. Those students who 
had attended kindergarten and first grade the previous year 
(n = 134 “outgoing” or “exiting” students) were invited to 
participate in the study. A number of families opted out 
(n = 14) or were not present during testing (n = 14), thus 
our sample included 106 students (60 exiting first graders, 
46 exiting kindergarteners). 


Design 


We examined within- and across-student performance on 
assessments administered by the research team at the begin- 
ning and end of summer school and standardized measures 
administered by the district (i.e., at the end of the previous 
school year and beginning of the following school year). 


Measures 


We administered two iPad adaptations of the number line 
estimation task (NLA 0-20 and NLA 0-100) and two curric-
ulum-based assessments (Assessing Student Proficiency in
Early Number Sense and EasyCBM). Accommodations 
were not provided as part of the administration procedures. 


NLA 0-20 and NLA 0-100. The general number line task 
first appeared for such uses in the literature in 2003 (Siegler 
& Opfer), but has been adapted for different ages, pur- 
poses, and presentation formats (Strand Cary, Laski, Shan- 
ley, & Clarke, 2014). For this study, we developed the NLA 
0-20 task and NLA 0-100 task (administered through
iPads and modeled after a 26-item paper/pencil number
line task by Laski, 2013). During the task, students were
presented with a horizontal line on the iPad screen while 
the iPad app (i.e., a female voice) explained the concept of 
the number line and that one end represents 0 and the other 
represents 20 (in the case of the NLA 0-20) or 100 (in the 
case of the NLA 0-100). Students practiced placing these 
numbers and received affirmative verbal feedback (e.g., 
“Right! Zero goes here on the number line.”) or corrective 
verbal and visual feedback (e.g., “That spot belongs to 
another number. Zero goes HERE on the number line”; 
“Close, but that spot belongs to another number. Twenty 
goes HERE on the number line.”). The spot was marked by 
the number presented sliding to the appropriate spot on the 
number line. 

Students then placed randomly presented target num- 
bers on the number line. Students’ responses and the time it 
took them to hit the “submit” button were logged by the 
app. The app also calculated item-specific errors (e.g., if 
the number was 4 and the student placed it at 10.25, the
error would be +6.25; if the student placed it at 1.73, the
error would be -2.27). The resulting data file included
information about students’ performance on the sample
items as well as each target number, associated error, and
submission time, as well as cumulative and mean absolute
error (i.e., the main outcome variable). Minimum scores
for both NLAs were 0 and maximum absolute scores were
15 (mean) and 240 (cumulative) for the NLA 0-20 and 77.58
(mean) and 2017 (cumulative) for the NLA 0-100. Responses
were not flagged for any particular response pattern (e.g.,
quick and random). Overall test time was approximately
3.5 min for the NLA 0-20 and 5 min for the NLA 0-100 task.
During the NLA 0-20 task, students were presented with most numbers
between 1 and 19. To avoid giving students clear “anchors”
during the task, 5, 10, and 15 were not presented. For pur- 
poses of this study, two forms of the NLA 0-20 were uti- 
lized. The first form (20a) presented 2, 4, 7, 8, 11, 13, 16, 
and 19 randomly in the first half of the assessment, then 1, 
3, 6, 9, 12, 14, 17, and 18 randomly in the second half. The 
second form (20b) reversed the halves. During the NLA 
0-100 task, 26 numbers (the same as those used in Laski, 
2013) were presented. 
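
The scoring described above can be sketched in a few lines. This is a minimal illustration and not the app's actual code: the function names (`signed_error`, `score_responses`, `max_absolute_scores`) are hypothetical, and the worst-case calculation assumes that a student can place a number anywhere between the two endpoints. Under that assumption, the sketch reproduces the reported NLA 0-20 maxima of 240 (cumulative) and 15 (mean).

```python
from statistics import mean

# Target numbers for the NLA 0-20 (5, 10, and 15 are withheld as anchors).
NLA_20_TARGETS = [2, 4, 7, 8, 11, 13, 16, 19, 1, 3, 6, 9, 12, 14, 17, 18]


def signed_error(target, placement):
    """Signed estimation error: placement minus the target's true location."""
    return placement - target


def score_responses(responses):
    """Return (cumulative, mean) absolute error for (target, placement) pairs."""
    abs_errors = [abs(signed_error(t, p)) for t, p in responses]
    return sum(abs_errors), mean(abs_errors)


def max_absolute_scores(targets, upper):
    """Worst case (assumed): each number is placed at the endpoint farthest from it."""
    worst = [max(t, upper - t) for t in targets]
    return sum(worst), mean(worst)


# The two worked errors from the text.
print(round(signed_error(4, 10.25), 2))  # 6.25
print(round(signed_error(4, 1.73), 2))   # -2.27

# Worst-case cumulative and mean absolute error for the NLA 0-20.
print(max_absolute_scores(NLA_20_TARGETS, 20))
```

Because the mean absolute error is simply the cumulative error divided by the number of items, the two summaries rank students identically whenever every student completes the same number of items.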


Assessing Student Proficiency in Early Number Sense (ASPENS; 
Sopris). The first grade ASPENS assessment consists of 
three, 1- to 2-min timed, individually administered measures
that assess the ability to compare two numerals and deter-
mine which is greater (Magnitude Comparison), identify the
missing numeral in a string of three numerals (Missing
Number), and solve simple addition and subtraction compu-
tation problems that cross 10 (Basic Arithmetic Facts and
Base 10; Clarke et al., 2011). Subtest scores are calculated 
and weighted to form an overall ASPENS composite score. 
We administered the first grade winter version at the begin- 
ning of summer school and the spring version at the end. 
The authors report test-retest reliability ranging from .71 to 
.90. Concurrent and predictive validity with the TerraNova 
Third Edition is reported as ranging from .57 to .63 and as 
.63 respectively. 


EasyCBM. EasyCBM is a mathematics measure that empha- 
sizes conceptual understanding over basic computation and 
is based on the Common Core State Standards for Mathe- 
matics (Anderson, Alonzo, & Tindal, 2010; Common Core 
State Standards Initiative, 2010). Each individualized math 
assessment is computer-administered and contains 30 (kin- 
dergarten) or 35 (first grade) items. For first grade, the mea- 
sures exhibit strong internal consistency (Cronbach’s alpha 
from .78 to .89) and concurrent validity (correlation of .73 
with the TerraNova; Anderson et al., 2010). The EasyCBM
system generates percentile scores and raw scores. We used 
percentile scores to provide additional contextual informa- 
tion regarding the study sample. 


Assessment Procedures 


All procedures were approved by the participating district 
and the University’s institutional review board (IRB). 
Parent information letters with opt-out postcards were 
mailed 2 weeks before the study start date and student 
assent was procured during the initial test session. The week 
before pretesting, a nine-person data collection team com- 
prising five seasoned university assessors and four new- 
hires was trained to administer the ASPENS and number 
line tasks. The 2-hr training included training in administra- 
tion logistics (e.g., counterbalancing, technical trouble- 
shooting), practice assessments and in-training reliability 
(i.e., fidelity of administration) for the ASPENS. A standard 
of 90% was required to be considered reliable, but retrain-
ing to 100% reliability for all data collectors took place in
the days following the training. 

Initial assessments were administered within the first 
week of summer school. The project coordinator (i.e., a vet-
eran data collector) shadow scored assessors’ first ASPENS
administrations and conducted informal, in-field observa- 
tions to verify in-field reliability of administration for both 
the ASPENS and number line tasks. Students first com- 
pleted ASPENS, then both number line assessments; the 
entire testing battery took 15 to 20 min per student. Random 
assignment was used to determine whether students first 
completed the NLA 0-20 or NLA 0-100, as well as which 
of the two NLA 0-20 forms they completed (i.e., 20a or 
20b). The same data collection team administered final 
assessments a few days before the end of summer school. 
Half of the students started with ASPENS and the other half 
started with NLA tasks. At posttest, students completed the 
NLA 0-20 and NLA 0-100 in the opposite order (and form) 
from their initial assessment (i.e., if they completed pretest-
ing in the order NLA 0-100/NLA 0-20b, posttesting would
be NLA 0-20a/NLA 0-100). ASPENS tests were scanned
and scored by Teleform. Number line assessments were 
scored and saved to .csv files by the app itself. The district 
provided spring EasyCBM scores (of the school year pre- 
ceding the study) and fall EasyCBM scores (of the school 
year following the study) in a .csv file. EasyCBM is used by
the district beginning in Grade 1 (thus pretest data is not 
available for our exiting kindergarten sample). 
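
The counterbalancing of NLA order and NLA 0-20 form described above can be expressed as a small sketch. The function names (`assign_pretest`, `posttest_from`) and the dictionary layout are hypothetical and are not taken from the study materials; the sketch only encodes the rule that the posttest uses the opposite order and the opposite form.

```python
import random

TASKS = ("NLA 0-20", "NLA 0-100")
FORMS = ("20a", "20b")


def assign_pretest(rng):
    """Randomly choose which NLA comes first and which NLA 0-20 form is used."""
    return {"first": rng.choice(TASKS), "form": rng.choice(FORMS)}


def posttest_from(pretest):
    """Posttest uses the opposite task order and the opposite NLA 0-20 form."""
    first = TASKS[1] if pretest["first"] == TASKS[0] else TASKS[0]
    form = FORMS[1] if pretest["form"] == FORMS[0] else FORMS[0]
    return {"first": first, "form": form}


# The example from the text: pretest NLA 0-100 then NLA 0-20b.
pretest = {"first": "NLA 0-100", "form": "20b"}
print(posttest_from(pretest))

# A random assignment for one student (seed chosen arbitrarily).
print(assign_pretest(random.Random(1)))
```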


Statistical Analyses 


Univariate descriptive analyses were performed on all mea- 
sures of number line knowledge and mathematics achieve- 
ment. Pearson’s r correlation coefficients were used to 
examine the relationships among the study variables. 
Reliability coefficients and relevant correlations were exam- 
ined to evaluate the psychometric properties of the NLA 
0-20 and NLA 0-100. Then, hierarchical multiple linear 
regression models were generated to address our second 
research question. Namely, we tested whether NLA 0-100
scores predicted EasyCBM percentile rank scores above and 
beyond ASPENS composite scores for outgoing kindergarten 
and first grade students. The NLA 0-20 scores were not 
tested due to low correlations with other math measures used 
in the study. The hierarchical multiple linear regression mod- 
els analyzed here included two steps. In the first step, 
EasyCBM percentile rank scores were regressed on ASPENS 
composite scores. Then, the NLA 0-100 scores were added 
to the model to determine the extent to which NLA 0-100 
scores were able to explain additional variance in mathemat- 
ics achievement percentile rank. In the model results, we 
reported the R² and ΔR² statistics to describe the pro-
portion of variance in the dependent variable captured by the
independent variables in each block. Basic descriptive analy- 
ses were conducted using SPSS 20.0 for Mac OS (IBM Corp, 
2011) and all subsequent models were investigated using the 
maximum likelihood estimation in Mplus 7.1 (Muthén & 
Muthén, 2013). 
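
The two-step logic can be illustrated with a short sketch. It uses ordinary least squares in NumPy rather than the maximum likelihood estimation in Mplus used in the study, and the data are randomly generated for illustration only; the function names (`r_squared`, `hierarchical_steps`) and variable names are assumptions. The sketch computes R² for each step, the change in R², and the corresponding F statistic for the added block.

```python
import numpy as np


def r_squared(predictors, outcome):
    """R^2 from an ordinary least squares fit (an intercept is added here)."""
    design = np.column_stack([np.ones(len(outcome)), predictors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((outcome - outcome.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def hierarchical_steps(block1, block2, outcome):
    """Return (R2 step 1, R2 step 2, delta R2, delta F) for two nested models."""
    block1 = np.column_stack([block1])
    full = np.column_stack([block1, block2])
    r2_step1 = r_squared(block1, outcome)
    r2_step2 = r_squared(full, outcome)
    added = full.shape[1] - block1.shape[1]
    df_resid = len(outcome) - full.shape[1] - 1
    delta_r2 = r2_step2 - r2_step1
    delta_f = (delta_r2 / added) / ((1.0 - r2_step2) / df_resid)
    return r2_step1, r2_step2, delta_r2, delta_f


# Made-up scores for illustration only; these are NOT the study data.
rng = np.random.default_rng(0)
n = 40
aspens = rng.normal(size=n)
nla_100 = rng.normal(size=n)
easycbm = 0.6 * aspens - 0.3 * nla_100 + rng.normal(size=n)

r2_1, r2_2, d_r2, d_f = hierarchical_steps(aspens, nla_100, easycbm)
print(f"Step 1 R2 = {r2_1:.2f}, Step 2 R2 = {r2_2:.2f}, "
      f"delta R2 = {d_r2:.2f}, delta F(1, {n - 3}) = {d_f:.2f}")
```

Entering the established screener first means that any variance it shares with the number line score is credited to the screener, so the change in R² is a conservative estimate of what the new measure adds.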


Results 


Descriptive statistics and correlations between the study 
variables are displayed in Table 1. Across both grade level 
samples, correlations between all variables ranged from .01 
to .94. As expected, all of the mathematics screening sub- 
tests (ASPENS) were positively correlated with one another 
in both the kindergarten and first grade samples. A greater
range in ASPENS subtest correlations was observed in the
kindergarten sample, r = .21-.85, as compared to the first
grade sample, r = .61-.83. Whereas the NLA 0-20 scores
demonstrated few statistically significant correlations with
other measures, the NLA 0-100 was positively associated
with a number of other measures in both samples, r =
.01-.48.


Reliability 


Test-retest reliability for the NLA 0-100 was measured at
.72 and .70 in kindergarten and first grade, respectively.
Alternate form reliability was assessed for the NLA 0-20
and ranged from .57 to .61. Given questions about the accu-
racy of Cronbach’s alpha as a measure of internal
consistency (Sijtsma, 2009), λ₂ values were estimated in
addition to Cronbach’s alpha at each administration occasion.
Across samples, NLA 0-100 internal consistency ranged
from α = .88-.90, λ₂ = .88-.90; NLA 0-20 internal consis-
tency ranged from α = .83-.93, λ₂ = .84-.94.
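
For readers less familiar with these coefficients, the sketch below shows one common way to compute a test-retest (or alternate form) correlation and coefficient alpha. The data are invented for illustration and the function names (`pearson_r`, `cronbach_alpha`) are hypothetical; this is not the code used to produce the values reported above.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two sets of scores from the same students."""
    return float(np.corrcoef(x, y)[0, 1])


def cronbach_alpha(items):
    """Coefficient alpha; rows are students, columns are items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return float((k / (k - 1)) * (1.0 - sum_item_vars / total_var))


# Invented mean absolute errors at two occasions (NOT study data).
time1 = [12.1, 8.4, 15.0, 6.2, 9.9]
time2 = [11.0, 9.1, 13.7, 7.5, 8.8]
print(round(pearson_r(time1, time2), 2))

# Invented per-item absolute errors for five students on four items.
item_errors = [
    [3.0, 2.5, 4.0, 3.5],
    [1.0, 1.5, 1.0, 2.0],
    [5.0, 4.5, 6.0, 5.5],
    [2.0, 2.0, 3.0, 2.5],
    [4.0, 3.0, 4.5, 4.0],
]
print(round(cronbach_alpha(item_errors), 2))
```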


Incremental Validity 


Together, ASPENS composite scores and the NLA 0-100 
performance explained 17% of the variance in mathematics 
achievement percentile rank for the kindergarten sample,
F(2, 35) = 3.48, p < .05; however, neither measure was
statistically significantly associated with mathematics per-
centile rank in the kindergarten sample. Although not statis-
tically significant, the NLA 0-100 explained an additional
7% of the variance in mathematics percentile rank. Results 
of the hierarchical linear regression analyses for first grade 
(see Table 2) indicated that the addition of the NLA 0-100 
significantly improved the prediction of mathematics 
achievement percentile rank. ASPENS composite scores
and NLA 0-100 performance were statistically significantly 
associated with mathematics percentile rank in the first 
grade sample, F(2, 37) = 17.29, p < .001. Together the 
measures explained 49% of the variance in mathematics 
achievement percentile rank for the first grade sample, and 
NLA 0-100 scores uniquely explained 13% of the variance 
in the first grade sample, ΔF(1, 36) = 8.47, p < .05.


Discussion 


Our work in investigating the NLA was guided by an orien-
tation toward the use of CBM-like measures, which requires
considerations related to efficiency of use for screening, the 
capacity to model growth for progress monitoring, and a 
reflection of current best practices in early mathematics 
screening (Author et al., 2012). Because these consider- 
ations guided our thinking, we investigated the utility of the 
NLA when added to a standard early numeracy screening 
battery. Thus, rather than studying the technical characteris- 
tics of the NLA in isolation, we examined if the inclusion of 
the NLA would explain additional variance in math achieve- 
ment to such an extent that the expenditure of additional 
resources and time to collect number line data was justi-
fied. Results indicate low to moderate (Salvia & Ysseldyke,
2004) alternate form and test-retest reliability and stronger 
internal consistency reliability. Validity correlations were 
highly variable with stronger results at first grade compared 
to kindergarten and the NLA 0-100 measure demonstrating 
more statistically significant correlations. In addition, the 
NLA 0-100 measure explained an additional 7% of the 
variance on the kindergarten criterion measure and a statis- 
tically significant 13% of additional variance on the first 
grade criterion measure. Given the additional time required 
to administer the NLA measures (approx. 3.5-5 min) rela-
tive to the additional variance explained in student mathe-
matics performance and the need for caution in interpreting
results of correlational studies (Thompson, Diamond, 
McWilliam, Snyder, & Snyder, 2005), the results from this 
study do not support a change in current practice but they do 
raise several interesting questions and directions for future 
research. 

Table 2. Hierarchical Regression Analysis Predicting Fall
EasyCBM Percentile With Time 2 ASPENS Composite Scores
and NLA 0-100 for First Grade Sample (N = 39).

Step  Predictor           R²    ΔR²   B       SE B    β
1     Intercept           .36   -     7.10    4.41
      ASPENS composite                0.68    0.13    0.60***
2     Intercept           .49   .13   16.79   11.02
      ASPENS composite                0.74    0.17    0.56***
      0-100 number line               -0.62   0.32    -0.25*

Note. CBM = curriculum-based measurement; ASPENS = Assessing
Student Proficiency in Early Number Sense; NLA = number line
assessment.
*p < .05. **p < .01. ***p < .001.

Approaches to validating assessments typically include
initially working with a sample of students across the skill
spectrum. A limitation of this study was that the sample was
a sample of convenience and consisted of students who
were receiving services due to being identified as at-risk in
reading. The sample was lower performing in mathematics 
with average math scores at the 17th (kindergarten) and 
25th (first grade) percentile. The composition of the sample, 
students eligible for summer school services, may have 
contributed to lower correlations than in previous studies of 
early numeracy measures and affected the amount of vari- 
ance accounted for in the regression models. Given the rela- 
tive complexity of the NLA, results may have differed if a 
broader set of students had been assessed. Future research 
should examine the NLA with greater sample sizes, a wider 
range of student performance, and include data points from 
across the school year rather than within a limited window 
during the summer. In addition, the study focused on the 
value of the NLAs when added to a specific screening bat-
tery, ASPENS; future work should include a range of early
numeracy screeners. The use of an iPad and the construc- 
tion of the task required some degree of fine motor skill and 
thus fine motor skills and other factors such as hand-eye 
coordination may have been confounding factors affecting 
student performance. Finally, we were not able to provide 
sample-specific demographics, limiting the generalizability
of the findings. 

The NLA measures used in the current study align most
closely to the number line task used by Siegler and col- 
leagues, which they hypothesize measures students’ knowl- 
edge of number magnitude. Using this task, Siegler and 
colleagues (Booth & Siegler, 2006; Fazio et al., 2014; 
Laski & Siegler, 2007; Siegler & Opfer, 2003) found that 
as students develop a greater understanding of magnitude 
their placement of numbers on a number line transitions 
from a logarithmic (i.e., estimates exaggerate differences
between smaller numbers and compress differences
between larger numbers) to linear representation (i.e., dif-
ferences are neither exaggerated nor compressed). This log-
arithmic to linear shift has been demonstrated across
several studies and replicated across grade levels when 
varying degrees of difficulty are introduced (e.g., Berteletti, 
Lucangeli, Piazza, Dehaene, & Zorzi, 2010; Geary et al.,
2008; Siegler & Booth, 2004). This shift from logarithmic
to linear estimates on the number line task is hypothesized 
by some researchers to reflect a shift toward a greater 
understanding of number and magnitude (e.g., Siegler & 
Opfer, 2003). 
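
One common way to examine this pattern is to fit both a linear and a logarithmic function to a student's placements and compare how well each fits. The sketch below illustrates the idea with invented placements on a 0-100 line; it is an assumption-laden illustration, not the procedure used in this study or in the cited work, and the function names (`fit_r2`, `compare_models`) and target values are hypothetical.

```python
import numpy as np


def fit_r2(predictor, estimates):
    """R^2 of a straight-line fit of the estimates on the given predictor."""
    slope, intercept = np.polyfit(predictor, estimates, 1)
    predicted = slope * predictor + intercept
    ss_res = ((estimates - predicted) ** 2).sum()
    ss_tot = ((estimates - estimates.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)


def compare_models(targets, estimates):
    """Fit estimates against the targets (linear) and their logs (logarithmic)."""
    targets = np.asarray(targets, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    return {
        "linear": fit_r2(targets, estimates),
        "logarithmic": fit_r2(np.log(targets), estimates),
    }


# Invented target numbers (must be greater than 0 for the log model).
targets = np.array([2.0, 5.0, 18.0, 34.0, 56.0, 78.0, 91.0])

# A student whose placements follow a compressed, log-like pattern.
log_like = 100.0 * np.log(targets) / np.log(100.0)
print(compare_models(targets, log_like))

# A student whose placements are perfectly linear.
print(compare_models(targets, targets))
```

A student would be described as having a more linear representation when the linear fit is the better of the two, which is the comparison the logarithmic-to-linear shift refers to.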

Other researchers postulate that the pattern seen in 
Siegler’s number line task does not indicate a shift in stu- 
dents’ mental representation of a number line, but instead is 
an artifact of the task structure. Researchers with this view 
hypothesize that the upper bound on the number line places 
a limit on students’ estimates of larger target numbers and 
forces them to shift those estimates down, resulting in the 
log-linear pattern (e.g., Barth & Paladino, 2011; Cohen & 
Sarnecka, 2014). To test this theory, Cohen and Sarnecka 
(2014) examined student performance on an unbounded 
number line task which included a lower but not an upper 
bound. They found that the logarithmic to linear shift in stu- 
dent responding was not present. These researchers propose 
that the unbounded number line task captures students’ 
understanding of number magnitude, while the bounded 
task used by Siegler and colleagues is first solved through 
students’ use of measurement skills and eventually solved 
through their engagement in proportional reasoning and use 
of subtraction and division. 

While these viewpoints reflect perspectives of number 
line task performance within a developmental context, the 
different task types may have utility as screeners for stu- 
dents at-risk across mathematics achievement levels and 
implications for research into the utility of using number 
line assessments as part of screening batteries. Given the 
depth of work done on the bounded number line to date, we
used the bounded number line in this study; however, future
research could tease out how well different number line 
tasks work with students of varying skill levels and at dif- 
ferent grades. For example, if the bounded number line task 
of Siegler and colleagues taps into more advanced mathe- 
matical skills such as subtraction and division, this screener 
may be better suited to differentiate among students with 
more advanced mathematics skills and/or for use as a uni-
versal screener at later grade levels. In contrast, the
unbounded number line task may be more effective at dis- 
criminating between at-risk students with lower mathemat- 
ics skills and/or for use as a universal screener in early 
elementary. Given the lower performing sample in this 
study, the bounded number line task may have been too dif- 
ficult and a better number line screener would be an 
unbounded “easier” number line task. More closely align- 
ing the measure to the appropriate skill or grade level would 
help eliminate floor or ceiling effects that might occur due 
to the difficulty level of the task. Future research could con- 
trast bounded and unbounded tasks at various grades and 
with varying skill levels to help determine if certain types of 
number lines work better at different points in time with 
different populations. 
By exploring new tasks and their value in enhancing early
numeracy screening systems, the work summarized in this
manuscript meets the call to advance the research on screen-
ing systems for use in multitier service delivery frameworks
(Methe et al., 2011). We would consider it a worthy endeavor 
to analyze number line tasks in isolation to further the field’s 
understanding of mathematical development. In addition, we 
also encourage research and researchers to analyze the num- 
ber line in the context of current screening practice. To that 
end, research investigating the value added of number line 
assessments to existing screening batteries should be con- 
ducted and findings considered within the framework of cost 
(e.g., efficiency) to benefit including key decisions related to 
classification accuracy of screeners (Clarke et al., 2011) and 
the capability to identify potential nonresponders to interven- 
tions (Compton et al., 2012). Advances in this regard will 
assist the field in furthering our understanding of mathemat- 
ics development and designing more effective MTSS frame- 
works and in doing so will, hopefully, help ensure that all 
children acquire the critical mathematics knowledge neces- 
sary for success inside and outside of school. 